{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🤗 Welcome to AdalFlow!\n",
    "## The library to build & auto-optimize any LLM task pipelines\n",
    "\n",
    "Thanks for trying us out, we're here to provide you with the best LLM application development experience you can dream of 😊 any questions or concerns you may have, [come talk to us on discord,](https://discord.gg/ezzszrRZvT) we're always here to help! ⭐ <i>Star us on <a href=\"https://github.com/SylphAI-Inc/AdalFlow\">Github</a> </i> ⭐\n",
    "\n",
    "\n",
    "# Quick Links\n",
    "\n",
    "Github repo: https://github.com/SylphAI-Inc/AdalFlow\n",
    "\n",
    "Full Tutorials: https://adalflow.sylph.ai/index.html#.\n",
    "\n",
    "Deep dive on each API: check out the [developer notes](https://adalflow.sylph.ai/tutorials/index.html).\n",
    "\n",
    "Common use cases along with the auto-optimization:  check out [Use cases](https://adalflow.sylph.ai/use_cases/index.html).\n",
    "\n",
    "# Author\n",
    "\n",
    "This notebook was created by community contributor [Name](Replace_to_github_or_other_social_account).\n",
    "\n",
    "# Outline\n",
    "\n",
    "This is a quick introduction of what AdalFlow is capable of. We will cover:\n",
    "\n",
    "* Simple Chatbot with structured output\n",
    "* RAG task pipeline + Data processing pipeline\n",
    "* Agent\n",
    "\n",
    "**Next: Try our [auto-optimization](https://colab.research.google.com/drive/1n3mHUWekTEYHiBdYBTw43TKlPN41A9za?usp=sharing)**\n",
    "\n",
    "\n",
    "# Installation\n",
    "\n",
    "1. Use `pip` to install the `adalflow` Python package. We will need `openai`, `groq`, and `faiss`(cpu version) from the extra packages.\n",
    "\n",
    "  ```bash\n",
    "  pip install adalflow[openai,groq,faiss-cpu]\n",
    "  ```\n",
    "2. Setup  `openai` and `groq` API key in the environment variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import clear_output\n",
    "\n",
    "!pip install -U adalflow[openai,groq,faiss-cpu]\n",
    "\n",
    "clear_output()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip uninstall httpx anyio -y\n",
    "# !pip install \"anyio>=3.1.0,<4.0\"\n",
    "# !pip install httpx==0.24.1\n",
    "\n",
    "# uncomment if you run into issues in colab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set Environment Variables\n",
    "\n",
    "Run the following code and pass your api key.\n",
    "\n",
    "Note: for normal `.py` projects, follow our [official installation guide](https://lightrag.sylph.ai/get_started/installation.html).\n",
    "\n",
    "*Go to [OpenAI](https://platform.openai.com/docs/introduction) and [Groq](https://console.groq.com/docs/) to get API keys if you don't already have.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "API keys have been set.\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "from getpass import getpass\n",
    "\n",
    "# Prompt user to enter their API keys securely\n",
    "openai_api_key = getpass(\"Please enter your OpenAI API key: \")\n",
    "groq_api_key = getpass(\"Please enter your GROQ API key: \")\n",
    "\n",
    "\n",
    "# Set environment variables\n",
    "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
    "os.environ[\"GROQ_API_KEY\"] = groq_api_key\n",
    "\n",
    "print(\"API keys have been set.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Embedder \n",
    "\n",
    "What you will learn?\n",
    "\n",
    "- What is Embedder and why is it designed this way?\n",
    "\n",
    "- When to use Embedder and how to use it?\n",
    "\n",
    "- How to batch processing with BatchEmbedder?\n",
    "\n",
    "core.embedder.Embedder class is similar to Generator, it is a user-facing component that orchestrates embedding models via ModelClient and output_processors. Compared with using ModelClient directly, Embedder further simplify the interface and output a standard EmbedderOutput format.\n",
    "\n",
    "By switching the ModelClient, you can use different embedding models in your task pipeline easily, or even embedd different data such as text, image, etc.\n",
    "\n",
    "# EmbedderOutput\n",
    "core.types.EmbedderOutput is a standard output format of Embedder. It is a subclass of DataClass and it contains the following core fields:\n",
    "\n",
    "data: a list of embeddings, each embedding if of type core.types.Embedding.\n",
    "\n",
    "error: Error message if any error occurs during the model inference stage. Failure in the output processing stage will raise an exception instead of setting this field.\n",
    "\n",
    "raw_response: Used for failed model inference.\n",
    "\n",
    "Additionally, we add three properties to the EmbedderOutput:\n",
    "\n",
    "length: The number of embeddings in the data.\n",
    "\n",
    "embedding_dim: The dimension of the embeddings in the data.\n",
    "\n",
    "is_normalized: Whether the embeddings are normalized to unit vector or not using numpy.\n",
    "\n",
    "# Embedder in Action\n",
    "We currently support all embedding models from OpenAI and ‘thenlper/gte-base’ from HuggingFace transformers. We will use these two to demonstrate how to use Embedder, one from the API provider and the other using local model. For the local model, you might need to ensure transformers is installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from adalflow.core.embedder import Embedder\n",
    "from adalflow.components.model_client import OpenAIClient\n",
    "\n",
    "model_kwargs = {\n",
    "    \"model\": \"text-embedding-3-small\",\n",
    "    \"dimensions\": 256,\n",
    "    \"encoding_format\": \"float\",\n",
    "}\n",
    "\n",
    "query = \"What is the capital of China?\"\n",
    "\n",
    "queries = [query] * 100\n",
    "\n",
    "\n",
    "embedder = Embedder(model_client=OpenAIClient(), model_kwargs=model_kwargs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 256 True\n",
      "100 256\n"
     ]
    }
   ],
   "source": [
    "output = embedder(query)\n",
    "print(output.length, output.embedding_dim, output.is_normalized)\n",
    "# 1 256 True\n",
    "output = embedder(queries)\n",
    "print(output.length, output.embedding_dim)\n",
    "# 100 256"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use Local Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "ename": "ImportError",
     "evalue": "Please install transformers with: pip install transformers",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mImportError\u001b[0m                               Traceback (most recent call last)",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/utils/lazy_import.py:216\u001b[0m, in \u001b[0;36msafe_import\u001b[0;34m(module_names, install_message)\u001b[0m\n\u001b[1;32m    215\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 216\u001b[0m     return_modules\u001b[38;5;241m.\u001b[39mappend(\u001b[43mimportlib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mimport_module\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodule_name\u001b[49m\u001b[43m)\u001b[49m)\n\u001b[1;32m    217\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m:\n",
      "File \u001b[0;32m/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90\u001b[0m, in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m     89\u001b[0m         level \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[0;32m---> 90\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_bootstrap\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_gcd_import\u001b[49m\u001b[43m(\u001b[49m\u001b[43mname\u001b[49m\u001b[43m[\u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpackage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1387\u001b[0m, in \u001b[0;36m_gcd_import\u001b[0;34m(name, package, level)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1360\u001b[0m, in \u001b[0;36m_find_and_load\u001b[0;34m(name, import_)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1331\u001b[0m, in \u001b[0;36m_find_and_load_unlocked\u001b[0;34m(name, import_)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:935\u001b[0m, in \u001b[0;36m_load_unlocked\u001b[0;34m(spec)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap_external>:999\u001b[0m, in \u001b[0;36mexec_module\u001b[0;34m(self, module)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:488\u001b[0m, in \u001b[0;36m_call_with_frames_removed\u001b[0;34m(f, *args, **kwds)\u001b[0m\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/transformers/__init__.py:26\u001b[0m\n\u001b[1;32m     25\u001b[0m \u001b[38;5;66;03m# Check the dependencies satisfy the minimal versions required.\u001b[39;00m\n\u001b[0;32m---> 26\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m dependency_versions_check\n\u001b[1;32m     27\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m     28\u001b[0m     OptionalDependencyNotAvailable,\n\u001b[1;32m     29\u001b[0m     _LazyModule,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     48\u001b[0m     logging,\n\u001b[1;32m     49\u001b[0m )\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/transformers/dependency_versions_check.py:57\u001b[0m\n\u001b[1;32m     55\u001b[0m             \u001b[38;5;28;01mcontinue\u001b[39;00m  \u001b[38;5;66;03m# not required, check version only if installed\u001b[39;00m\n\u001b[0;32m---> 57\u001b[0m     \u001b[43mrequire_version_core\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdeps\u001b[49m\u001b[43m[\u001b[49m\u001b[43mpkg\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     58\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/transformers/utils/versions.py:117\u001b[0m, in \u001b[0;36mrequire_version_core\u001b[0;34m(requirement)\u001b[0m\n\u001b[1;32m    116\u001b[0m hint \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mTry: `pip install transformers -U` or `pip install -e \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.[dev]\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m` if you\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mre working with git main\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m--> 117\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mrequire_version\u001b[49m\u001b[43m(\u001b[49m\u001b[43mrequirement\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mhint\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/transformers/utils/versions.py:111\u001b[0m, in \u001b[0;36mrequire_version\u001b[0;34m(requirement, hint)\u001b[0m\n\u001b[1;32m    110\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m op, want_ver \u001b[38;5;129;01min\u001b[39;00m wanted\u001b[38;5;241m.\u001b[39mitems():\n\u001b[0;32m--> 111\u001b[0m     \u001b[43m_compare_versions\u001b[49m\u001b[43m(\u001b[49m\u001b[43mop\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mgot_ver\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwant_ver\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrequirement\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpkg\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mhint\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/transformers/utils/versions.py:44\u001b[0m, in \u001b[0;36m_compare_versions\u001b[0;34m(op, got_ver, want_ver, requirement, pkg, hint)\u001b[0m\n\u001b[1;32m     43\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m ops[op](version\u001b[38;5;241m.\u001b[39mparse(got_ver), version\u001b[38;5;241m.\u001b[39mparse(want_ver)):\n\u001b[0;32m---> 44\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\n\u001b[1;32m     45\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mrequirement\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m is required for a normal functioning of this module, but found \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpkg\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m==\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mgot_ver\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mhint\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m     46\u001b[0m     )\n",
      "\u001b[0;31mImportError\u001b[0m: tokenizers>=0.19,<0.20 is required for a normal functioning of this module, but found tokenizers==0.21.0.\nTry: `pip install transformers -U` or `pip install -e '.[dev]'` if you're working with git main",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[0;31mImportError\u001b[0m                               Traceback (most recent call last)",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/utils/lazy_import.py:149\u001b[0m, in \u001b[0;36mLazyImport.load_class\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    148\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 149\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodule \u001b[38;5;241m=\u001b[39m \u001b[43mimportlib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mimport_module\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodule_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    150\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclass_ \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodule, class_name)\n",
      "File \u001b[0;32m/opt/homebrew/Cellar/python@3.12/3.12.8/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90\u001b[0m, in \u001b[0;36mimport_module\u001b[0;34m(name, package)\u001b[0m\n\u001b[1;32m     89\u001b[0m         level \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[0;32m---> 90\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_bootstrap\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_gcd_import\u001b[49m\u001b[43m(\u001b[49m\u001b[43mname\u001b[49m\u001b[43m[\u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m:\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpackage\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1387\u001b[0m, in \u001b[0;36m_gcd_import\u001b[0;34m(name, package, level)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1360\u001b[0m, in \u001b[0;36m_find_and_load\u001b[0;34m(name, import_)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:1331\u001b[0m, in \u001b[0;36m_find_and_load_unlocked\u001b[0;34m(name, import_)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:935\u001b[0m, in \u001b[0;36m_load_unlocked\u001b[0;34m(spec)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap_external>:999\u001b[0m, in \u001b[0;36mexec_module\u001b[0;34m(self, module)\u001b[0m\n",
      "File \u001b[0;32m<frozen importlib._bootstrap>:488\u001b[0m, in \u001b[0;36m_call_with_frames_removed\u001b[0;34m(f, *args, **kwds)\u001b[0m\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/components/model_client/transformers_client.py:18\u001b[0m\n\u001b[1;32m     15\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01madalflow\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mlazy_import\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m safe_import, OptionalPackages\n\u001b[0;32m---> 18\u001b[0m transformers \u001b[38;5;241m=\u001b[39m \u001b[43msafe_import\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m     19\u001b[0m \u001b[43m    \u001b[49m\u001b[43mOptionalPackages\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mTRANSFORMERS\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mvalue\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mOptionalPackages\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mTRANSFORMERS\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mvalue\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[1;32m     20\u001b[0m \u001b[43m)\u001b[49m\n\u001b[1;32m     21\u001b[0m torch \u001b[38;5;241m=\u001b[39m safe_import(OptionalPackages\u001b[38;5;241m.\u001b[39mTORCH\u001b[38;5;241m.\u001b[39mvalue[\u001b[38;5;241m0\u001b[39m], OptionalPackages\u001b[38;5;241m.\u001b[39mTORCH\u001b[38;5;241m.\u001b[39mvalue[\u001b[38;5;241m1\u001b[39m])\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/utils/lazy_import.py:218\u001b[0m, in \u001b[0;36msafe_import\u001b[0;34m(module_names, install_message)\u001b[0m\n\u001b[1;32m    217\u001b[0m     \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m:\n\u001b[0;32m--> 218\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00minstall_message\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m    220\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m return_modules[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(return_modules) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m return_modules\n",
      "\u001b[0;31mImportError\u001b[0m: Please install transformers with: pip install transformers",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[0;31mImportError\u001b[0m                               Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[4], line 5\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01madalflow\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcomponents\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mmodel_client\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m TransformersClient\n\u001b[1;32m      4\u001b[0m model_kwargs \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmodel\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mthenlper/gte-base\u001b[39m\u001b[38;5;124m\"\u001b[39m}\n\u001b[0;32m----> 5\u001b[0m local_embedder \u001b[38;5;241m=\u001b[39m Embedder(model_client\u001b[38;5;241m=\u001b[39m\u001b[43mTransformersClient\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m, model_kwargs\u001b[38;5;241m=\u001b[39mmodel_kwargs)\n\u001b[1;32m      7\u001b[0m output \u001b[38;5;241m=\u001b[39m local_embedder(query)\n\u001b[1;32m      8\u001b[0m \u001b[38;5;28mprint\u001b[39m(output\u001b[38;5;241m.\u001b[39mlength, output\u001b[38;5;241m.\u001b[39membedding_dim, output\u001b[38;5;241m.\u001b[39mis_normalized)\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/utils/lazy_import.py:162\u001b[0m, in \u001b[0;36mLazyImport.__call__\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m    160\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21m__call__\u001b[39m(\u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m    161\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"Create the class instance.\"\"\"\u001b[39;00m\n\u001b[0;32m--> 162\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_class\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    163\u001b[0m     log\u001b[38;5;241m.\u001b[39mdebug(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCreating class instance: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclass_\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m    164\u001b[0m     \u001b[38;5;66;03m# normal class initialization\u001b[39;00m\n",
      "File \u001b[0;32m~/Documents/test/LightRAG/.venv/lib/python3.12/site-packages/adalflow/utils/lazy_import.py:154\u001b[0m, in \u001b[0;36mLazyImport.load_class\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    152\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m:\n\u001b[1;32m    153\u001b[0m     log\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mOptional module not installed: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mimport_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 154\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mImportError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moptional_package\u001b[38;5;241m.\u001b[39merror_message\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
      "\u001b[0;31mImportError\u001b[0m: Please install transformers with: pip install transformers"
     ]
    }
   ],
   "source": [
    "from adalflow.core.embedder import Embedder\n",
    "from adalflow.components.model_client import TransformersClient\n",
    "\n",
    "model_kwargs = {\"model\": \"thenlper/gte-base\"}\n",
    "local_embedder = Embedder(model_client=TransformersClient(), model_kwargs=model_kwargs)\n",
    "\n",
    "output = local_embedder(query)\n",
    "print(output.length, output.embedding_dim, output.is_normalized)\n",
    "# 1 768 True\n",
    "\n",
    "output = local_embedder(queries)\n",
    "print(output.length, output.embedding_dim, output.is_normalized)\n",
    "# 100 768 True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use Output Processors\n",
    "If we want to decreate the embedding dimension to only 256 to save memory, we can customize an additional output processing step and pass it to embedder via the output_processors argument.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from adalflow.core.types import Embedding, EmbedderOutput\n",
    "from adalflow.core.functional import normalize_vector\n",
    "from typing import List\n",
    "from adalflow.core.component import Component\n",
    "from copy import deepcopy\n",
    "\n",
    "\n",
    "class DecreaseEmbeddingDim(Component):\n",
    "    def __init__(self, old_dim: int, new_dim: int, normalize: bool = True):\n",
    "        super().__init__()\n",
    "        self.old_dim = old_dim\n",
    "        self.new_dim = new_dim\n",
    "        self.normalize = normalize\n",
    "        assert self.new_dim < self.old_dim, \"new_dim should be less than old_dim\"\n",
    "\n",
    "    def call(self, input: List[Embedding]) -> List[Embedding]:\n",
    "        output: EmbedderOutput = deepcopy(input)\n",
    "        for embedding in output.data:\n",
    "            old_embedding = embedding.embedding\n",
    "            new_embedding = old_embedding[: self.new_dim]\n",
    "            if self.normalize:\n",
    "                new_embedding = normalize_vector(new_embedding)\n",
    "            embedding.embedding = new_embedding\n",
    "        return output.data\n",
    "\n",
    "    def _extra_repr(self) -> str:\n",
    "        repr_str = f\"old_dim={self.old_dim}, new_dim={self.new_dim}, normalize={self.normalize}\"\n",
    "        return repr_str"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_embedder_256 = Embedder(\n",
    "    model_client=TransformersClient(),\n",
    "    model_kwargs=model_kwargs,\n",
    "    output_processors=DecreaseEmbeddingDim(768, 256),\n",
    ")\n",
    "print(local_embedder_256)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = local_embedder_256(query)\n",
    "print(output.length, output.embedding_dim, output.is_normalized)\n",
    "# 1 256 True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Batch Embedding\n",
    "\n",
    "Especially in data processing pipelines, you can often have more than 1000 queries to embed. We need to chunk our queries into smaller batches to avoid memory overflow. core.embedder.BatchEmbedder is designed to handle this situation. For now, the code is rather simple, but in the future it can be extended to support multi-processing when you use AdalFlow in production data pipeline.\n",
    "\n",
    "The BatchEmbedder orchestrates the Embedder and handles the batching process. To use it, you need to pass the Embedder and the batch size to the constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from adalflow.core.embedder import BatchEmbedder\n",
    "\n",
    "batch_embedder = BatchEmbedder(embedder=local_embedder, batch_size=100)\n",
    "\n",
    "queries = [query] * 1000\n",
    "\n",
    "response = batch_embedder(queries)\n",
    "# 100%|██████████| 11/11 [00:04<00:00,  2.59it/s]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To integrate your own embedding model or from API providers, you need to implement your own subclass of ModelClient.\n",
    "\n",
    "References\n",
    "\n",
    "transformers: https://huggingface.co/docs/transformers/en/index\n",
    "\n",
    "thenlper/gte-base model: https://huggingface.co/thenlper/gte-base\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Issues and feedback\n",
    "\n",
    "If you encounter any issues, please report them here: [GitHub Issues](https://github.com/SylphAI-Inc/LightRAG/issues).\n",
    "\n",
    "For feedback, you can use either the [GitHub discussions](https://github.com/SylphAI-Inc/LightRAG/discussions) or [Discord](https://discord.gg/ezzszrRZvT)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "lightrag-project",
   "language": "python",
   "name": "light-rag-project"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
